Chapter 15

Introducing Correlation and Regression

IN THIS CHAPTER

Getting a handle on correlation analysis

Understanding the many kinds of regression analysis

Correlation, regression, curve-fitting, model-building — these terms all describe a set of general statistical techniques that deal with the relationships among variables. Introductory statistics courses usually present only the simplest form of correlation and regression, equivalent to fitting a straight line to a set of data. But in the real world, correlations and regressions are seldom that simple — statistical problems may involve more than two variables, and the relationship among them can be quite complicated.

The words correlation and regression are often used interchangeably, but they refer to two different concepts:

Correlation refers to the strength and direction of the relationship between two variables, or among a group of variables.

Regression refers to a set of techniques for describing how the values of a variable or a group of variables may cause, predict, or be associated with the values of another variable.
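
To make the distinction concrete, here is a minimal sketch in Python (assuming NumPy and SciPy are installed; the dose and response numbers are made up purely for illustration). The pearsonr call summarizes the strength and direction of the association, while linregress fits the straight line that lets you predict one variable from the other.

# Correlation versus regression on hypothetical data
import numpy as np
from scipy import stats

# Hypothetical data: drug dose (mg) and observed response for ten subjects
dose = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
response = np.array([4.1, 5.0, 5.9, 6.8, 7.2, 8.1, 8.8, 9.5, 10.3, 11.0])

# Correlation: how strongly, and in which direction, the two variables move together
r, p_value = stats.pearsonr(dose, response)
print(f"Pearson r = {r:.3f}, p = {p_value:.4f}")

# Regression: the fitted straight line that predicts response from dose
fit = stats.linregress(dose, response)
print(f"predicted response = {fit.intercept:.2f} + {fit.slope:.3f} * dose")

Running the sketch prints the correlation coefficient with its p value, followed by the intercept and slope of the fitted line.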

You can study correlation and regression for many years and not master all of it. In this chapter, we cover the kinds of correlation and regression most often encountered in biological research and explain the differences between them. We also explain some terminology used throughout Parts 5 and 6.

Correlation: Estimating How Strongly Two Variables Are Associated

Correlation refers to the extent to which two variables are related. In the following sections, we describe the Pearson correlation coefficient and discuss ways to analyze correlation coefficients.

Lining up the Pearson correlation coefficient

The Pearson correlation coefficient is represented by the symbol r and measures the extent to which two variables (X and Y) tend to lie along a straight line when graphed. If the variables have